information projection
Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection
We consider the problem of producing fair probabilistic classifiers for multi-class classification tasks. We formulate this problem in terms of "projecting" a pre-trained (and potentially unfair) classifier onto the set of models that satisfy target group-fairness requirements. The new, projected model is given by post-processing the outputs of the pre-trained classifier by a multiplicative factor. We provide a parallelizable, iterative algorithm for computing the projected classifier and derive both sample complexity and convergence guarantees. Comprehensive numerical comparisons with state-of-the-art benchmarks demonstrate that our approach maintains competitive performance in terms of accuracy-fairness trade-off curves, while achieving favorable runtime on large datasets. We also evaluate our method at scale on an open dataset with multiple classes, multiple intersectional groups, and over 1M samples.
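The multiplicative post-processing described in the abstract can be sketched as an alternating update. The sketch below pushes each group's average predicted class rates toward a common target (a statistical-parity-style constraint); the function name, the per-(group, class) factor scheme, and the target-rate formulation are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def project_classifier(probs, groups, target_rates, n_iter=200):
    """Post-process classifier outputs with per-(group, class) multiplicative
    factors so each group's average predicted class rates approach
    `target_rates`. Illustrative sketch only.

    probs: (n, k) pre-trained class probabilities
    groups: (n,) integer group labels
    target_rates: (k,) desired average probability per class
    """
    factors = np.ones((groups.max() + 1, probs.shape[1]))
    for _ in range(n_iter):
        for g in np.unique(groups):
            mask = groups == g
            adj = probs[mask] * factors[g]
            adj /= adj.sum(axis=1, keepdims=True)
            # multiplicative update nudging group-g averages toward targets
            factors[g] *= target_rates / adj.mean(axis=0)
    adjusted = probs * factors[groups]
    return adjusted / adjusted.sum(axis=1, keepdims=True)
```

Because the factors act multiplicatively on the pre-trained probabilities, the post-processed model never needs to be retrained; only the small factor table is fit.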
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Overview: The paper proposes a framework for enforcing structure in Bayesian models via structured prior selection based on the maximum entropy principle. Although the optimal prior may not be tractable, the authors develop an approximation method using submodular optimization. Constructing priors with structured variables is an important topic, so this method should be able to make a good impact. Quality: The paper is technically sound.
The Benefits of Balance: From Information Projections to Variance Reduction
Data balancing across multiple modalities and sources appears in various forms in foundation models in machine learning and AI, e.g., in CLIP and DINO. We show that data balancing across modalities and sources actually offers an unsuspected benefit: variance reduction. We present a non-asymptotic statistical bound that quantifies this variance reduction effect and relates it to the eigenvalue decay of Markov operators. Furthermore, we describe how various forms of data balancing in contrastive multimodal learning and self-supervised clustering can be better understood, and even improved upon, owing to our variance reduction viewpoint.
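One concrete instance of the data balancing discussed above is iterative proportional fitting (Sinkhorn scaling) of a joint source-by-cluster count matrix toward uniform marginals. The setup below is an illustrative sketch under that assumption, not the paper's pipeline.

```python
import numpy as np

def balance(counts, n_iter=100):
    """Sinkhorn-style iterative proportional fitting: rescale a joint
    (source x cluster) count matrix so both marginals become uniform.
    Illustrative sketch of one simple data-balancing operation.
    """
    P = counts / counts.sum()
    r = np.full(P.shape[0], 1.0 / P.shape[0])  # target row marginal
    c = np.full(P.shape[1], 1.0 / P.shape[1])  # target column marginal
    for _ in range(n_iter):
        P *= (r / P.sum(axis=1))[:, None]   # match row marginal
        P *= (c / P.sum(axis=0))[None, :]   # match column marginal
    return P
```

For strictly positive count matrices the alternating rescaling converges to a balanced distribution; the paper's variance-reduction bound concerns estimators built from such balanced weights.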
Meta-Dependence in Conditional Independence Testing
Mazaheri, Bijan, Zhang, Jiaqi, Uhler, Caroline
Constraint-based causal discovery algorithms utilize many statistical tests for conditional independence to uncover networks of causal dependencies. These approaches to causal discovery rely on an assumed correspondence between the graphical properties of a causal structure and the conditional independence properties of observed variables, known as the causal Markov condition and faithfulness. Finite data yields an empirical distribution that is "close" to the actual distribution. Across these many possible empirical distributions, the correspondence to the graphical properties can break down for different conditional independencies, and multiple violations can occur at the same time. We study this "meta-dependence" between conditional independence properties using the following geometric intuition: each conditional independence property constrains the space of possible joint distributions to a manifold. The "meta-dependence" between conditional independences is informed by the position of these manifolds relative to the true probability distribution. We provide a simple-to-compute measure of this meta-dependence using information projections and consolidate our findings empirically using both synthetic and real-world data.
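As a toy instance of projecting onto a conditional-independence manifold: for two discrete variables, the reverse-KL projection of a joint distribution onto the independence family is the product of its marginals, and the divergence to that projection is the mutual information. The sketch below illustrates only this textbook special case; names are illustrative and the paper's meta-dependence measure is not reproduced.

```python
import numpy as np

def project_onto_independence(p_xy):
    """Project a discrete joint p(x, y) onto the independence family
    {q : q(x, y) = q(x) q(y)} in the reverse-KL sense: the minimizer of
    KL(p || q) is the product of p's marginals, and the attained
    divergence equals the mutual information I(X; Y)."""
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    q = px * py
    mask = p_xy > 0
    mi = np.sum(p_xy[mask] * np.log(p_xy[mask] / q[mask]))
    return q, mi
```

A distribution far from the manifold (large divergence) makes the corresponding independence test easy to reject; distances to several such manifolds at once are what drive the meta-dependence picture.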
On Prior Distributions and Approximate Inference for Structured Variables
Oluwasanmi O. Koyejo, Rajiv Khanna, Joydeep Ghosh, Russell Poldrack
We present a general framework for constructing prior distributions with structured variables. The prior is defined as the information projection of a base distribution onto distributions supported on the constraint set of interest. In cases where this projection is intractable, we propose a family of parameterized approximations indexed by subsets of the domain. We further analyze the special case of sparse structure. While the optimal prior is intractable in general, we show that approximate inference using convex subsets is tractable, and is equivalent to maximizing a submodular function subject to cardinality constraints. As a result, inference using greedy forward selection provably achieves within a factor of (1-1/e) of the optimal objective value. Our work is motivated by the predictive modeling of high-dimensional functional neuroimaging data. For this task, we employ the Gaussian base distribution induced by local partial correlations and consider the design of priors to capture the domain knowledge of sparse support. Experimental results on simulated data and high dimensional neuroimaging data show the effectiveness of our approach in terms of support recovery and predictive accuracy.
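The greedy forward selection with the (1 - 1/e) guarantee can be sketched generically. Here `f` is any monotone submodular set function supplied by the caller; the paper's actual objective, derived from the information projection, is not reproduced.

```python
def greedy_max(f, ground, k):
    """Greedy forward selection for a monotone submodular set function f
    under a cardinality constraint |S| <= k; the classic result of
    Nemhauser, Wolsey, and Fisher guarantees f(S) >= (1 - 1/e) * OPT.
    """
    S = set()
    for _ in range(k):
        # pick the element with the largest marginal gain f(S + x) - f(S)
        best = max((x for x in ground if x not in S),
                   key=lambda x: f(S | {x}) - f(S))
        S.add(best)
    return S
```

For example, with a coverage objective (size of the union of chosen subsets, which is monotone submodular) the loop greedily adds the subset covering the most new elements at each step.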
The Benefits of Balance: From Information Projections to Variance Reduction
Liu, Lang, Mehta, Ronak, Pal, Soumik, Harchaoui, Zaid
Deep neural networks have shown remarkable success at learning task-specific representations of data when provided supervision from massive amounts of labeled training examples. Recent trends, however, have shifted toward task-agnostic, universal representations that may be easily fine-tuned or even have zero-shot capabilities out of the box. Supervised learning, stricto sensu, is too limited a framework for these billion-parameter, data-hungry models, and a question at the heart of modern machine learning is learning from unlabeled, partially labeled, or weakly labeled data. This need has paved the way for the current generation of self-supervised learning (SSL) approaches that circumvent the need for large amounts of strong labels. In SSL, a model is trained on a generic pseudo-task that can be performed on unlabeled data, such as relating the two modalities of an image-caption pair or two augmentations of the same image. Despite several modern foundation models such as DINO (Caron et al., 2021; Oquab et al., 2024) and CLIP (Radford et al., 2021) being trained in this fashion, many aspects of SSL remain baffling. In particular, the training process of self-supervised models often outgrows and "breaks the rules" of the standard empirical risk minimization (ERM) toolkit. ERM combines two well-understood techniques: minibatch sampling and gradient-based optimization using backpropagation. SSL, on the other hand, adds clever, less-understood techniques to the training pipeline.